Skip to content

ARROW-11787: [R] Implement write csv#10141

Closed
thisisnic wants to merge 27 commits into
apache:masterfrom
thisisnic:arrow-11787-write_csv
Closed

ARROW-11787: [R] Implement write csv#10141
thisisnic wants to merge 27 commits into
apache:masterfrom
thisisnic:arrow-11787-write_csv

Conversation

@thisisnic

Copy link
Copy Markdown
Member

No description provided.

@github-actions

Copy link
Copy Markdown

@thisisnic thisisnic force-pushed the arrow-11787-write_csv branch from 3401d83 to 3239575 Compare April 27, 2021 14:13
@thisisnic thisisnic changed the title ARROW-11787: [R] Implement write csv [WIP] ARROW-11787: [R] Implement write csv Apr 27, 2021
@thisisnic thisisnic marked this pull request as ready for review April 27, 2021 14:13
Comment thread r/R/csv.R Outdated
#' @docType class
#' @usage NULL
#' @format NULL
#' @description `CsvReadOptions`, `CsvParseOptions`, `CsvConvertOptions`,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description doesn't look quite correct.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative to documenting this here (and cleaning up the bad copy-paste) would be to document it with CsvReadOptions et al.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated now

@nealrichardson nealrichardson left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done. Some suggestions/leading questions.

Comment thread r/R/csv.R Outdated
#' @docType class
#' @usage NULL
#' @format NULL
#' @description `CsvReadOptions`, `CsvParseOptions`, `CsvConvertOptions`,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative to documenting this here (and cleaning up the bad copy-paste) would be to document it with CsvReadOptions et al.

Comment thread r/R/csv.R Outdated
#' }
#' @include arrow-package.R
write_csv_arrow <- function(x,
sink,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like indentation is slightly off here

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated now

Comment thread r/R/csv.R Outdated
Comment thread r/R/csv.R Outdated
Comment on lines +643 to +644
assert_that(length(include_header) == 1)
assert_that(is.logical(include_header))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you remove these--will the C++ static typing validate this enough?

What happens if include_header = NA?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed those as totally sensible errors from the C++ as you say. If include_header = NA with the assert_that removed, no header is written.

Comment thread r/R/csv.R Outdated
assert_that(length(include_header) == 1)
assert_that(is.logical(include_header))

write_options = CsvWriteOptions$create(include_header, batch_size)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
write_options = CsvWriteOptions$create(include_header, batch_size)
write_options <- CsvWriteOptions$create(include_header, batch_size)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Comment thread r/R/csv.R Outdated
x <- Table$create(x)
}

assert_is(x, c("Table", "RecordBatch"))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert_is(x, c("Table", "RecordBatch"))
assert_is(x, "ArrowTabular")

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Comment thread r/tests/testthat/test-csv.R Outdated

})


Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add tests for handling bad inputs too. Also might make more sense to put the writing tests at the bottom of the test file instead of the top.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread r/tests/testthat/test-csv.R Outdated

expect_identical(tbl_in, tbl_expected)

skip("Doesn't yet work with date columns due to ARROW-12540")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to test the file-with-dates in every combination of parameters, just the first one is sufficient.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - updated.

Comment thread r/tests/testthat/test-csv.R Outdated
expect_identical(tbl_in, tbl_expected)
})

test_that("Write a CSV file with different batch sizes", {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this testing? What does batch_size do? It doesn't look like there is an observable difference in the output.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

batch size dictates how much data is buffered when translating to CSV

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the output will be the same, but what's happening internally will be different. I included it as I wanted to make sure I could pass through the param, but I guess it's C++ functionality. Should I remove the tests for the different batch sizes and just make sure I can pass through the param once?

@nealrichardson nealrichardson left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Let's try moving the validation like this, and assuming the tests pass I'll merge (or someone else can)

Comment thread r/R/csv.R Outdated
Comment thread r/R/csv.R
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants